Automatic Identification of Bengali Noun-Noun Compounds Using Random Forest
Abstract
This paper presents a supervised machine learning approach that uses the Random Forest algorithm to recognize Bengali noun-noun compounds as multiword expressions (MWEs) in a Bengali corpus. The proposed approach has two steps: (1) extracting candidate multiword expressions using chunk information and various heuristic rules, and (2) training the machine learning algorithm to classify each candidate as a multiword expression or not. A variety of association measures, together with syntactic and linguistic clues, are used as features for identifying MWEs. The proposed system is tested on a Bengali corpus for identifying noun-noun compound MWEs.
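The first step and one possible feature can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes POS tags stand in for the chunk information and heuristic rules, it uses pointwise mutual information (PMI) as a single example of an association measure (the abstract does not name the measures used), and the corpus, the tag names, and the functions `extract_candidates` and `pmi_features` are hypothetical placeholders. The Random Forest training step is omitted.

```python
import math
from collections import Counter

# Toy POS-tagged corpus with placeholder tokens (an assumption, not real Bengali data).
corpus = [
    [("a", "NN"), ("b", "NN"), ("c", "VB")],
    [("a", "NN"), ("b", "NN"), ("d", "VB")],
    [("e", "NN"), ("c", "VB"), ("b", "NN")],
]


def extract_candidates(tagged_sentences):
    """Return every adjacent (noun, noun) pair; a stand-in for chunk-based extraction."""
    candidates = []
    for sentence in tagged_sentences:
        for (w1, t1), (w2, t2) in zip(sentence, sentence[1:]):
            if t1 == "NN" and t2 == "NN":
                candidates.append((w1, w2))
    return candidates


def pmi_features(tagged_sentences):
    """Map each distinct candidate pair to its PMI score (log base 2)."""
    unigrams = Counter(w for s in tagged_sentences for w, _ in s)
    bigrams = Counter()
    for s in tagged_sentences:
        words = [w for w, _ in s]
        bigrams.update(zip(words, words[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    features = {}
    for pair in set(extract_candidates(tagged_sentences)):
        p_xy = bigrams[pair] / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        features[pair] = math.log2(p_xy / (p_x * p_y))
    return features


features = pmi_features(corpus)
for pair, score in features.items():
    print(pair, round(score, 3))
```

In a full pipeline, scores like these would form one column of the feature vector for each candidate, alongside the syntactic and linguistic clues, before a classifier is trained on labeled candidates.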
Similar Resources
A Machine Learning Approach for the Identification of Bengali Noun-Noun Compound Multiword Expressions
This paper presents a machine learning approach for identification of Bengali multiword expressions (MWE) which are bigram nominal compounds. Our proposed approach has two steps: (1) candidate extraction using chunk information and various heuristic rules and (2) training the machine learning algorithm called Random Forest to classify the candidates into two groups: bigram nominal compound MWE ...
Identification of Noun-Noun (N-N) Collocations as Multi-Word Expressions in Bengali Corpus
Noun-Noun compounds, as a subset of Compound Nouns as well as Nominal Compounds play an important role in NLP applications like Machine Translation, Information Retrieval because of the token frequency, type frequency and their occurrence in the world’s languages. Recognition of MWEs requires deep or shallow syntactic preprocessing tools and large corpora. The problem is quite difficult in Beng...
Noun Compound and Named Entity Recognition and their Usability in Keyphrase Extraction
We investigate how the automatic identification of noun compounds and named entities can contribute to keyphrase extraction and we also show how previously identified noun compounds affect named entity recognition and vice versa, how noun compound detection is supported by identified named entities. Our experiments demonstrate that already known noun compounds yield better performance in named ...
Automatic Interpretation of Noun Compounds Using WordNet Similarity
The paper introduces a method for interpreting novel noun compounds with semantic relations. The method is built around word similarity with pretagged noun compounds, based on WordNet::Similarity. Over 1,088 training instances and 1,081 test instances from the Wall Street Journal in the Penn Treebank, the proposed method was able to correctly classify 53.3% of the test noun compounds. We also i...
Handling Of Prepositions In English To Bengali Machine Translation
The present study focuses on the lexical meanings of prepositions rather than on the thematic meanings because it is intended for use in an English-Bengali machine translation (MT) system, where the meaning of a lexical unit must be preserved in the target language, even though it may take a different syntactic form in the source and target languages. Bengali is the fifth language in the world ...